Abstract

The  “visual  motion”  problem  consists  of  estimating  the  motion  of  an  object  viewed  under 
projection.  In  this  paper  we  address  the  feasibility  of  such  a  problem. 

We will show that the model which defines the visual motion problem for feature points
in Euclidean 3D space lacks both linear and local (weak) observability. The locally
observable manifold is covered only after three levels of Lie differentiation. However, by imposing
metric constraints on the state space, it is possible to reduce the set of indistinguishable states.

We will then analyze a model for visual motion estimation in terms of identification of
an Exterior Differential System, with the parameters living on a topological manifold, called
the “essential manifold”, which explicitly includes the aforementioned metric
constraints in its definition. We will show that rigid motion is globally observable/identifiable under perspective
projection with zero levels of Lie differentiation under some general position conditions. Such
conditions hold when the viewer does not move on a quadric surface containing all the visible
points.


1  Introduction 

Animals  face  everyday  tasks  which  require  the  ability  to  estimate  the  relative  motion  between  them 
and  the  objects  populating  the  environment  (or  the  environment  itself),  such  as  walking,  avoiding 
obstacles, grasping objects, etc. Only recently have dynamic estimation and control techniques
given encouraging results for designing automatic systems which mimic such abilities [13, 12, 20,
21,  3,  26]. 

If  we  restrict  our  attention  to  motions  inside  a  “static  scene”,  then  the  rigid  motion  constraint 
and  the  perspective  projection  map  define  a  nonlinear  dynamical  model.  Motion  estimation  may  be 
formalized in terms of parameter identification and/or state estimation of such a model. Traditionally
the estimation task has been performed using an Extended Kalman Filter (EKF) [31, 8, 29].

In  this  paper  we  address  the  feasibility  of  estimating  the  motion  of  a  rigid  object  from  perspective 
observations  using  a  dynamic  model. 

*Research funded by the California Institute of Technology, ONR grant N00014-93-1-0990 and an AT&T Foundation
Special Purpose grant. This work is registered as CDS Technical Report CIT-CDS 94-001, California Institute
of Technology, January 1994 - revised February 1994. Submitted to the invited session on “Dynamic Vision, System
Theoretical Methods and Control Applications” at the 33rd IEEE Conf. on Decision and Control, Florida, 1994.
A crucial issue in dynamic estimation/identification is the observability of the model, or the
identifiability of its parameters. We will see that the model which “defines” the visual motion
problem for feature points in Euclidean 3D space is neither linearly observable nor locally
weakly observable. It is possible, as we will see, to reduce the set of locally indistinguishable states
by imposing metric constraints on the state space; however, the model suffers from some structural
limitations which make methods based on local linearization poorly conditioned and not robust
enough to be used in real-world applications.

Rigid motion is, however, globally observable from perspective projections, once the problem is
formulated in the appropriate topological space. In this paper we analyze a new formulation for
motion estimation [47] in terms of identification of an Exterior Differential System [7] with the
parameters living on a topological manifold, called the “essential manifold”. Using some results
from the computational vision literature, we show that this model is globally observable without
any Lie differentiation under general position conditions. Such conditions are met when the object
and the path of the center of projection cannot be embedded in a quadric surface [15, 37], and can
be verified using a simple rank test.

1.1  Existing  literature  and  relations  to  previous  work 

The  use  of  dynamic  observers  for  estimating  scene  structure  and/or  motion  dates  back  to  the 
eighties  [16,  6,  36,  23].  These  works  assumed  that  either  the  viewer’s  motion  or  the  shape  of  the 
object  was  known.  More  recent  works  deal  with  the  estimation  of  both  structure  and  motion 
recursively  from  perspective  projections  [24,  5,  42,  1,  49]  using  an  EKF.  The  schemes  are  based 
upon  minor  variations  of  the  same  model,  and  none  of  them  addresses  the  issue  of  its  observability. 
There is also a vast literature on stereo-motion and on batch schemes for recovering structure
and  motion;  for  an  extensive  review  see  [14]  and  references  therein. 

Our work is somewhat orthogonal to [17, 11], in that they assess the feasibility of structure
estimation for known motion. We study instead the problem of motion estimation for unknown
structure. Once motion is known, structure is linearly observable from the rigid motion model.
Therefore if we estimate motion, and the estimates are properly weighted through the second-order
statistics of the estimation error, any “structure from motion” module incorporating motion
error (for example [42, 49]) may be used for estimating scene structure reliably. Note that motion
and structure play interchangeable roles: when either one is known, the other can be uniquely
determined. However, motion can be reconstructed independently of structure, and structure estimates
play a role only in disambiguating multiple solutions and in propagating scale information across
time.

The  model  described  in  the  last  part  of  this  paper  is  inspired  by  [34],  and  the  results  used  for 
proving its observability/identifiability are from [37, 15].

1.2  Organization  of  the  paper 

In section 2 we define and formalize the visual motion problem, first in full generality, and then
restricted to the case in which the scene is described by a number of “feature points” in Euclidean
space.  In  this  case  the  motion  problem  is  defined  by  the  rigid  motion  constraint  and  the  perspective 
projection  map. 

In section 3 we show some alternative formulations of motion estimation in terms of
inversion/estimation or identification/estimation of a nonlinear dynamical model. Motivated by the
limitations of such models, we reformulate the problem in terms of state estimation of a nonlinear
model defined on a linear state space, raising the issue of observability.
In section 4 we address the linear observability and local (weak) observability of the model
which defines the motion problem in the case of feature points in Euclidean 3D space. We
show that it is neither linearly observable nor locally observable. We show how to reduce the set
of indistinguishable states by imposing metric constraints on the state space. However, the local
observability codistribution reaches full rank only after three levels of bracketing, indicating that the
model is poorly observable.

In section 5 we will describe a formulation of motion estimation as identification of a system
in exterior differential form, with the parameters living on a topological manifold, which we call the
“essential manifold”. The model is globally observable with no level of differentiation once general
position conditions hold. Such conditions are characterized using a simple rank test.

2  Visual  motion  and  structure  estimation 

So far we have discussed scene structure and motion estimation without referring to a specific
scene  (or  object)  representation.  In  a  systems-theoretic  framework,  the  structure  of  an  object 
is  encoded  by  a  finite  number  of  “feature  points”  on  a  differentiable  manifold.  These  could  be 
the  coefficients  of  polynomial  curves  or  splines  fitting  the  contours  of  the  object,  or  parameters 
describing locally its surface or, in the simplest case, points in Euclidean 3D space. In such a
case the “structure manifold” is simply $\mathbb{R}^3$ and the features correspond to salient regions of the
object, such as visible corners. In what follows we assume we are given the “correspondence”
of  projection  of  feature  points  across  time,  i.e.  at  each  time  instant  we  measure  the  projection  of 
each  feature  point  onto  the  image  plane  (or  retina)  and  we  know  which  point  corresponds  to  which 
across  time  (for  a  review  of  some  available  methods  for  solving  the  correspondence  problem  see  [2]). 

Since  we  are  interested  in  relative  motion,  it  is  equivalent  to  assume  that  the  viewer  is  moving 
inside  a  static  scene,  carrying  along  his  reference  frame  (viewer-centered  representation)  or  that 
the  scene  is  a  rigid  object  which  is  moving  in  front  of  the  viewer  together  with  its  reference  (object- 
centered  representation). 

Viewer-centered  representation  The  viewer  maintains  a  local  coordinatization  of  the  feature 
space,  which  changes  in  time  as  he  moves  inside  the  scene.  Meanwhile  he  perceives  the 
projection  of  the  scene  onto  its  sensor  (retina  or  CCD),  which  corresponds  to  measurements 
of  a  time  invariant  projection  of  the  feature  manifold  in  the  motion-dependent  (viewer-based) 
coordinatization. 

If the viewer moves inside a static scene, his motion between two time instants is described as
a point in $SE(3)$, the group of rigid transformations of $\mathbb{R}^3$ [39]. In this case we may imagine
the viewer-centered coordinates as a family of diffeomorphisms parametrized by points of
$SE(3)$. If the viewer moves with constant velocity, motion is represented by a single point;
otherwise he describes a whole path in $SE(3)$.

Object-centered representation Alternatively we may imagine a “static” coordinate parametrization
of the feature space and the viewer measuring the output of a projective map which
is time-varying according to the motion of the scene. If the scene is moving rigidly, the viewer
measures the output of a family of maps from the feature manifold onto an appropriate
real-projective space, parametrized by points in $SE(3)$.

These  two  viewpoints,  object-centered  and  viewer-centered,  are  substantially  equivalent,  since  they 
may  be  transformed  one  into  the  other  via  a  motion  dependent  diffeomorphism;  however,  each  has 
its  own  advantages,  depending  on  the  application. 
In both cases the visual motion problem consists in estimating the path in $SE(3)$ describing
the relative motion between the viewer and the object.

2.1  Formalization  of  the  model 

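Let us call $\mathcal{N}$ the feature $n$-manifold and $\mathbf{p} \in \mathcal{N}$ a feature point; $M$ is the motion manifold and $q \in M$
represents motion. The map

$$\pi : \mathbf{p} \mapsto y = \pi(\mathbf{p}),$$

defined on $\mathcal{N}$ and taking values in an appropriate real-projective space, represents a perspective projection. Let $N$ and $p$ denote the local-coordinate counterparts of $\mathcal{N}$
and $\mathbf{p}$. We suppose there exists a set of compatible charts on $\mathcal{N}$:

$$\phi : \mathcal{N} \to N, \qquad \mathbf{p} \mapsto p = \phi(\mathbf{p}).$$

The visual motion problem is formalized in the viewer-centered representation by a family of
maps on $N$ parametrized by $q \in M$:

$$\begin{cases} p(t+1) = f(p(t), q); & p \in N,\ q \in M \\ y = \pi(p) \end{cases}$$

where $f$ is the map encoding rigid motion; $f(p, q) = \phi_q(p)$, and

$$\phi_q : \mathcal{N} \times M \to N, \qquad (\mathbf{p}, q) \mapsto p = \phi_q(\mathbf{p}); \quad q \in M$$

is the viewer-centered local-coordinates chart of $\mathcal{N}$, which varies in time as the viewer moves with
$q(t)$. Alternatively, motion may be represented by its velocity: if the motion manifold is $M = SE(3)$, the velocity lives in $T_e SE(3) = se(3)$,
the Lie algebra of twists corresponding to rigid motions. In this case we formalize the visual
motion problem using a family of vector fields on $N$:

$$\begin{cases} \dot{p} = f(p, q); & p \in N,\ q \in se(3) \\ y = \pi(p). \end{cases}$$

The object-centered representation corresponds to the model

$$\begin{cases} \mathbf{p}(t+1) = \mathbf{p}(t) \\ y = \pi_q(\mathbf{p}) \end{cases}$$

where the map $\pi_q$, defined on $\mathcal{N} \times M$, is

$$\pi_q : (\mathbf{p}, q) \mapsto \pi \circ \phi^{-1} \circ f(\phi(\mathbf{p}), q).$$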
Point-based visual motion


Let us now consider a simple paradigm, in which $p \in \mathbb{R}^3$ is a salient point in the scene,
$\mathbf{X} = [X, Y, Z]^T$ are its coordinates with respect to an orthonormal reference frame centered in the
pupil of the viewer, with the $Z$ axis pointing forward and $X$, $Y$ arranged so as to form a right-handed
frame. Let $q = [V, \Omega] \in se(3)$ represent the canonical screw coordinates of the rigid motion of
the viewer (body velocity [39]). As the viewer moves, each point describes a vector field on $\mathbb{R}^3$; in
the viewer-centered representation, $f(p, q)$ is simply

$$f(\mathbf{X}, V(t), \Omega(t)) = \Omega(t) \wedge \mathbf{X} + V(t).$$

If we represent motion between two time instants $t$ and $t + \tau$, and velocity is held constant between
the two samples, we have

$$f(\mathbf{X}, T(t, \tau), R(t, \tau)) = R(t, \tau)\mathbf{X} + T(t, \tau)$$

where $(T, R)$ are related to $(V, \Omega)$ via [39]

$$R(t, \tau) = e^{\Omega(t) \wedge \tau} \tag{1}$$

$$T(t, \tau) = \frac{1}{\|\Omega(t)\|^2} \left[ \left(I - e^{\Omega(t) \wedge \tau}\right) \Omega(t) \wedge {} + \Omega(t)\Omega(t)^T \tau \right] V(t). \tag{2}$$

In what follows we will assume $\tau = 1$ (constant sampling rate).

The map $\pi$ is the trivial association of each $p \neq 0$ with its projective coordinates as an element of
$\mathbb{RP}^2$:

$$\pi : p \mapsto [p].$$

In summary, if we represent the scene structure using points in Euclidean 3D space, the visual
motion problem is defined by the constraints of rigid motion and perspective projection. For
instance, in the viewer-centered instantaneous representation we have

$$\begin{cases} \dot{\mathbf{X}} = \Omega \wedge \mathbf{X} + V; & \mathbf{X}(0) = \mathbf{X}_0 \\ \mathbf{x} = \pi(\mathbf{X}), \end{cases}$$

where $\mathbf{x} = [x, y, 1]^T$ are the coordinates of the projection of the point $\mathbf{X}$ onto the image plane.
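As a numerical illustration of the point-based model, the following sketch integrates the rigid-motion vector field for a constant twist and compares the result with a closed-form displacement (Rodrigues' formula for the rotation, plus the standard closed-form translation of a constant twist). The function names (`hat`, `rigid_motion`, `project`, `integrate`) and the numerical values are hypothetical and chosen only for this example; this is a minimal sketch under the constant-velocity assumption, not the implementation used in the paper.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rigid_motion(V, Omega, tau=1.0):
    """Discrete motion (R, T) generated by a constant twist (V, Omega) over tau."""
    theta = np.linalg.norm(Omega)
    if theta < 1e-12:                       # pure translation
        return np.eye(3), V * tau
    W = hat(Omega)
    # Rodrigues' formula for the matrix exponential exp(W * tau)
    R = (np.eye(3) + np.sin(theta * tau) / theta * W
         + (1.0 - np.cos(theta * tau)) / theta**2 * (W @ W))
    # Closed-form translation of a constant twist
    T = ((np.eye(3) - R) @ W + np.outer(Omega, Omega) * tau) @ V / theta**2
    return R, T

def project(X):
    """Perspective projection: X -> [x, y, 1]."""
    return X / X[2]

def integrate(X0, V, Omega, tau=1.0, steps=2000):
    """Runge-Kutta (RK4) integration of dX/dt = Omega ^ X + V."""
    f = lambda X: np.cross(Omega, X) + V
    X, h = X0.astype(float), tau / steps
    for _ in range(steps):
        k1 = f(X)
        k2 = f(X + 0.5 * h * k1)
        k3 = f(X + 0.5 * h * k2)
        k4 = f(X + h * k3)
        X = X + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

# Illustrative values (assumptions, not from the paper)
X0 = np.array([0.3, -0.2, 4.0])
V = np.array([0.1, 0.0, 0.5])
Omega = np.array([0.02, -0.05, 0.1])

R, T = rigid_motion(V, Omega)
X1_closed = R @ X0 + T            # closed-form displacement
X1_num = integrate(X0, V, Omega)  # numerical integration of the vector field
x1 = project(X1_closed)           # image-plane coordinates [x, y, 1]
```

The two end points agree up to integration error, which is a quick consistency check between the instantaneous and the discrete representations of the same motion.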

3  Systems-theoretic  characterization  of  visual  motion  estimation 

We have seen that, if the scene is represented by a set of $n$ points in 3D space moving rigidly
with respect to the viewer, the visual motion problem is defined by the rigidity constraint and the
perspective projection equations. If $\mathbf{X}_i$ are the coordinates of the $i$-th point and $\mathbf{x}_i$ the corresponding
projection, we may write

$$\begin{cases} \dot{\mathbf{X}}_i = \Omega \wedge \mathbf{X}_i + V, & \mathbf{X}_i(0) = \mathbf{X}_{i0} \\ \mathbf{x}_i = \pi(\mathbf{X}_i) + \nu_i, & \nu_i \in \mathcal{N}(0, R_i) \end{cases} \qquad \forall\, i = 1:n \tag{3}$$

where $\nu_i$ represents an error in measuring the position of the projection of the point $i$. Solving
the visual motion problem consists in estimating $\mathbf{X}_i$, $V$ and $\Omega$ for all the visible points $i$, i.e.
reconstructing both the input and the initial state of the above system from its noisy output (see
fig. 1). Alternatively, motion may be viewed as a vector of unknown parameters in the model (3)
which have to be identified.

Figure 1: Interpretation of rigid motion reconstruction as inversion/estimation or identification/estimation
of a dynamic model. The problem is transformed into a state estimation task.
(Panels: “Motion reconstruction via inversion/estimation”; “Motion reconstruction via state estimation”.)

In the present section we show that it is possible to invert/identify the above system and solve
for  motion  in  a  least-squares  fashion;  the  solution  is  analogous  to  that  found  in  [22].  However,  the 
structure  of  the  inverse  system  is  intrinsically  instantaneous,  since  the  original  model  is  driftless, 
and  hence  it  does  not  exploit  the  benefits  of  dynamic  observers. 


3.1  Motion  reconstruction  via  inversion/identification  of  a  nonlinear  model 


Motion estimation may be viewed as an inversion problem for the model (3) when the initial
state (structure) is unknown. It is well known that, under certain conditions on the relative degree,
it is possible to invert a nonlinear system [28]. In order to do that, we compute Lie derivatives of the
output along the state vector fields until the components of the input appear. If the coupling matrix
is nonsingular, we may invert it and reconstruct the input of the system from bracket combinations
of its output.

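In our case the model is driftless, and therefore all the components of the input appear at the
first level of differentiation:

$$\dot{\mathbf{x}}(t) = \begin{bmatrix} \frac{1}{Z} A(\mathbf{x}) & B(\mathbf{x}) \end{bmatrix} \begin{bmatrix} V(t) \\ \Omega(t) \end{bmatrix}$$

where

$$A = \begin{bmatrix} 1 & 0 & -x \\ 0 & 1 & -y \end{bmatrix}, \qquad B = \begin{bmatrix} -xy & 1 + x^2 & -y \\ -1 - y^2 & xy & x \end{bmatrix}. \tag{4}$$

If we observe enough points, we have an overdetermined system which we may solve for the motion
parameters in a least-squares sense. Calling $C_i = \begin{bmatrix} \frac{1}{Z_i} A_i & B_i \end{bmatrix}$, we have

$$\begin{bmatrix} V \\ \Omega \end{bmatrix} = \begin{bmatrix} C_1 \\ C_2 \\ \vdots \\ C_n \end{bmatrix}^{\dagger} \begin{bmatrix} \dot{\mathbf{x}}_1 \\ \dot{\mathbf{x}}_2 \\ \vdots \\ \dot{\mathbf{x}}_n \end{bmatrix} = C^{\dagger} \dot{\mathbf{x}}$$

where the symbol $\dagger$ denotes the pseudo-inverse. Note that $C_i$ depends on the depth of the point,
$Z_i$, which we do not know.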
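The least-squares inversion can be sketched numerically as follows. The example assumes, purely for illustration, that the depths $Z_i$ are known and that the image velocities are noise-free; as noted above, the depths are not available in practice, so this only illustrates the structure of the linear system. The helper name `flow_matrix` and all numerical values are hypothetical; the matrix entries are obtained by differentiating $x = X/Z$, $y = Y/Z$ along $\dot{\mathbf{X}} = \Omega \wedge \mathbf{X} + V$.

```python
import numpy as np

def flow_matrix(x, y, Z):
    """2x6 matrix C_i = [A/Z, B] mapping (V, Omega) to the image velocity of one point."""
    A = np.array([[1.0, 0.0, -x],
                  [0.0, 1.0, -y]])
    B = np.array([[-x * y, 1.0 + x * x, -y],
                  [-(1.0 + y * y), x * y, x]])
    return np.hstack([A / Z, B])

rng = np.random.default_rng(0)
n = 10
# Synthetic points in front of the viewer (Z > 0); illustrative values only
X = np.column_stack([rng.uniform(-1.0, 1.0, n),
                     rng.uniform(-1.0, 1.0, n),
                     rng.uniform(2.0, 6.0, n)])
V_true = np.array([0.2, -0.1, 0.3])
Omega_true = np.array([0.01, 0.03, -0.02])
motion_true = np.concatenate([V_true, Omega_true])

# Stack the per-point matrices into the overdetermined system xdot = C [V; Omega]
C = np.vstack([flow_matrix(Xi[0] / Xi[2], Xi[1] / Xi[2], Xi[2]) for Xi in X])
xdot = C @ motion_true

# Least-squares solution through the pseudo-inverse
motion_est = np.linalg.pinv(C) @ xdot
```

With noisy image velocities the same pseudo-inverse gives the least-squares estimate; the dependence of `C` on the unknown depths is exactly what prevents this static procedure from recovering both motion and structure.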
In order to reconstruct the initial depth it is necessary to observe it. Dynamic observers are in
essence  computing  differentiations  of  the  output  until  the  matrix  which  couples  the  initial  condition 
and  the  derivatives  of  the  output  (observability  matrix  or  observability  codistribution)  has  full  rank. 
In  our  case,  however,  both  the  input  and  the  initial  state  appear  at  the  same  level  of  differentiation, 
since  the  model  is  driftless.  Therefore  we  may  hope  to  recover  either  motion  or  depth  with  such  a 
technique,  but  not  both. 

It  is  still  possible,  however,  to  exploit  the  above  constraint  to  recover  rigid  motion  by  solving  a 
nonlinear  optimization  problem  constrained  to  a  linear  subspace.  Such  a  problem  may  be  formulated 
as the identification of an Exterior Differential System, as is done in [46]. See [22, 46] for more
details  on  this  formulation. 


3.2  Motion  reconstruction  as  state  estimation 

Because  the  model  described  in  the  previous  section  has  no  drift  dynamics,  left-inversion/state- 
estimation  reduces  to  a  static  (instantaneous)  procedure  and  hence  it  does  not  exploit  the  noise 
rejection  properties  of  dynamic  observers. 

One  possible  way  to  proceed,  based  on  the  above  considerations,  is  to  introduce  some  dynamics 
into  the  problem,  using  “dynamic  extension”.  Instead  of  considering  motion  as  the  input  of  the 
system,  we  consider  as  input  its  time  derivative,  and  insert  motion  into  the  state  dynamics. 

If we view motion estimation as identification of the model (3), this “dynamic extension” corresponds
to transforming the identification problem into a state estimation task. This has been done
in the literature of recursive identification since the early sixties (see for example [44, 10, 30] and
references therein). In essence the parameters to be estimated are inserted into the state dynamics
and a Kalman filter is used as a parameter estimator (see figure 1).

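In both interpretations we are led to an augmented model with the input being inserted into
the state dynamics:

$$\begin{cases} \dot{\mathbf{X}}_i = \Omega \wedge \mathbf{X}_i + V, & \mathbf{X}_i(0) = \mathbf{X}_{i0} \\ \dot{V} = f_V(V, \nu_V), & V(0) = V_0 \\ \dot{\Omega} = f_\Omega(\Omega, \nu_\Omega), & \Omega(0) = \Omega_0 \\ \mathbf{x}_i = \pi(\mathbf{X}_i) + \nu_i \end{cases} \tag{5}$$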


Since we do not know $f_V$ and $f_\Omega$, the visual motion problem may be formulated as an “unknown-
input/state estimation” problem. However, we may have some qualitative information about $f_V$, $f_\Omega$
which we want to exploit, for example a dynamical model when the camera is mounted on a moving
vehicle.

In the absence of such information, $f_V$ and $f_\Omega$ may describe a statistical model. The simplest case
is $f_V = f_\Omega = 0$, which corresponds to constant velocity (or “small acceleration”). A model often
used is Brownian motion: $f_V = \nu_V$, $f_\Omega = \nu_\Omega$ are white, zero-mean Gaussian noises^1 whose variances
are to be considered as tuning parameters. Note that the random walk model allows us to deal
with time-varying parameters (when the variation is slow compared to the sampling period);
moreover, the positive definite variance of the model error helps prevent saturation when a Kalman
filter is used to perform the estimation task.
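To make the augmented random-walk model concrete, the sketch below propagates the structure together with the two velocities, which follow a discretized Brownian motion, and produces noisy projections. The Euler-Maruyama discretization, the function names `augmented_step` and `measure`, and the noise intensities `q_V`, `q_Omega` and `r` are assumptions introduced for this example; the intensities play the role of the tuning parameters mentioned above.

```python
import numpy as np

def augmented_step(X, V, Omega, dt, q_V=0.0, q_Omega=0.0, rng=None):
    """One Euler-Maruyama step of the augmented model.

    X has shape (n, 3); V and Omega have shape (3,). The velocities follow a
    random walk with noise intensities q_V and q_Omega (tuning parameters).
    """
    if rng is None:
        rng = np.random.default_rng()
    X_next = X + dt * (np.cross(Omega, X) + V)      # rigid-motion vector field
    V_next = V + np.sqrt(dt) * q_V * rng.standard_normal(3)
    Omega_next = Omega + np.sqrt(dt) * q_Omega * rng.standard_normal(3)
    return X_next, V_next, Omega_next

def measure(X, r=0.0, rng=None):
    """Noisy perspective projection of each point (first two image coordinates)."""
    if rng is None:
        rng = np.random.default_rng()
    return X[:, :2] / X[:, 2:3] + r * rng.standard_normal((X.shape[0], 2))

# Short simulation with illustrative values
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0, 4.0], [1.0, -1.0, 5.0], [-0.5, 0.3, 6.0]])
V = np.array([0.1, 0.0, 0.2])
Omega = np.array([0.0, 0.1, 0.0])
outputs = []
for _ in range(50):
    X, V, Omega = augmented_step(X, V, Omega, dt=0.02,
                                 q_V=0.01, q_Omega=0.01, rng=rng)
    outputs.append(measure(X, r=1e-3, rng=rng))
outputs = np.array(outputs)   # shape (50, 3, 2): time, point, image coordinate
```

Setting `q_V = q_Omega = 0` recovers the constant-velocity case; a state estimator such as an EKF would use the same prediction step with the noise intensities entering its model covariance.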

A  crucial  issue  in  state  estimation  using  observers  is  of  course  observability  of  the  model,  which 
we  address  in  the  following  section. 

^1 This notation is incorrect, since white noise is not integrable; nevertheless, the notation is customary in statistical
estimation, and we will adopt it here.
4  Perspective  local  observability  of  rigid  motion

In  this  section  we  study  the  local  observability  of  the  model  (3).  We  show  that,  in  the  case  of 
constant  velocity  (or  small  acceleration),  it  is  not  locally  observable.  However,  by  enforcing  metric 
constraints  on  the  state  space,  it  is  possible  to  reduce  the  set  of  indistinguishable  states.  Some 
definitions  and  standard  results  on  local  observability  may  be  found  in  appendix  A. 

4.1  Linear  observability 

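We consider the linearization of the model (5): define $\mathcal{A}$ and $\mathcal{C}$ as the Jacobians of the state vector field and of the output map with respect to the state $[\mathbf{X}^T, V^T, \Omega^T]^T$, and $A$ as in (4). Suppose
for simplicity $n = 1$; it is immediate to see that

$$\mathcal{A} = \begin{bmatrix} (\Omega\wedge) & I & -(\mathbf{X}\wedge) \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$\mathcal{C} = \begin{bmatrix} \frac{1}{Z} A & 0 & 0 \end{bmatrix}$$

$$\mathcal{C}\mathcal{A}^k = \frac{1}{Z} A \begin{bmatrix} (\Omega\wedge)^k & (\Omega\wedge)^{k-1} & -(\Omega\wedge)^{k-1}(\mathbf{X}\wedge) \end{bmatrix}.$$

The observability matrix for the linearized system is hence

$$\mathcal{O} = \begin{bmatrix} \frac{1}{Z} A & 0 & 0 \\ \frac{1}{Z} A(\Omega\wedge) & \frac{1}{Z} A & -\frac{1}{Z} A(\mathbf{X}\wedge) \\ \vdots & \vdots & \vdots \\ \frac{1}{Z} A(\Omega\wedge)^{8} & \frac{1}{Z} A(\Omega\wedge)^{7} & -\frac{1}{Z} A(\Omega\wedge)^{7}(\mathbf{X}\wedge) \end{bmatrix}.$$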

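The above reduces, for planar motion (state ordered as $[X, Z, V_X, V_Z, \Omega_Y]$, output $x = X/Z$), to the matrix

$$\begin{bmatrix}
\frac{1}{Z} & -\frac{x}{Z} & 0 & 0 & 0 \\
\frac{x}{Z}\Omega_Y & \frac{1}{Z}\Omega_Y & \frac{1}{Z} & -\frac{x}{Z} & 1 + x^2 \\
-\frac{1}{Z}\Omega_Y^2 & \frac{x}{Z}\Omega_Y^2 & \frac{x}{Z}\Omega_Y & \frac{1}{Z}\Omega_Y & 0 \\
-\frac{x}{Z}\Omega_Y^3 & -\frac{1}{Z}\Omega_Y^3 & -\frac{1}{Z}\Omega_Y^2 & \frac{x}{Z}\Omega_Y^2 & -(1 + x^2)\Omega_Y^2 \\
\frac{1}{Z}\Omega_Y^4 & -\frac{x}{Z}\Omega_Y^4 & -\frac{x}{Z}\Omega_Y^3 & -\frac{1}{Z}\Omega_Y^3 & 0
\end{bmatrix}$$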

which is easily seen to have rank 3, since the last two rows are a scalar combination of rows 2 and
3 (and so are the subsequent rows). With a similar procedure the full matrix $\mathcal{O}$ is shown to have
rank 5, against a state space of dimension 9. The linearized system is therefore not observable,
and we say that the original model is not linearly observable.
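The rank deficiency of the linearized model can be checked numerically. The sketch below builds the Jacobians of the single-point model with state $[\mathbf{X}, V, \Omega] \in \mathbb{R}^9$ (here named `F` and `C`, hypothetical names) at an arbitrary operating point and stacks the rows `C`, `C F`, ..., `C F^8`. The operating point and the rank tolerance are assumptions of the example.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def linearized_observability(X, Omega):
    """Observability matrix of the linearization for one point.

    State ordering: [X (3), V (3), Omega (3)]. The Jacobian does not depend
    on V because V enters the dynamics linearly.
    """
    Z = X[2]
    x, y = X[0] / Z, X[1] / Z
    A = np.array([[1.0, 0.0, -x],
                  [0.0, 1.0, -y]])          # Jacobian of the projection, times Z
    F = np.zeros((9, 9))
    F[:3, :3] = hat(Omega)                  # d(Omega ^ X + V) / dX
    F[:3, 3:6] = np.eye(3)                  # d(Omega ^ X + V) / dV
    F[:3, 6:] = -hat(X)                     # d(Omega ^ X + V) / dOmega
    C = np.hstack([A / Z, np.zeros((2, 6))])
    blocks = [C]
    for _ in range(8):                      # C F, C F^2, ..., C F^8
        blocks.append(blocks[-1] @ F)
    return np.vstack(blocks)

# Arbitrary generic operating point (illustrative values)
O = linearized_observability(np.array([0.5, -0.3, 4.0]),
                             np.array([0.1, -0.2, 0.3]))
rank = np.linalg.matrix_rank(O, tol=1e-9)
```

For a generic operating point, with $\Omega$ not parallel to $\mathbf{X}$, the computed rank is 5, in agreement with the statement above.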

4.2  Local  observability 

The local observability space is defined as the set of the output functions and all their possible
Lie derivatives along vector fields in the accessibility algebra [28]. Under slow acceleration conditions,
the vector field in (3) is autonomous, and therefore the observability space is spanned by
$\{\pi, L_f \pi, \ldots, L_f^k \pi, \ldots\}$ where $f$ denotes the state vector field. The observability codistribution is
spanned by $d\mathcal{O} = \{dh \mid h \in \mathcal{O}\}$. The state manifold is $\mathbb{R}^9$; the rank of the observability codistribution
reaches its maximum of 8 after three levels of Lie differentiation.

The null space of the observability codistribution, in the case of nonzero forward translation, is:

$$\mathrm{Null}([dh, dL_f h, dL_f^2 h, dL_f^3 h]) = \mathrm{span} \begin{bmatrix} \frac{X}{V_Z} & \frac{Y}{V_Z} & \frac{Z}{V_Z} & \frac{V_X}{V_Z} & \frac{V_Y}{V_Z} & 1 & 0 & 0 & 0 \end{bmatrix}^T.$$
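In the case of zero forward translation, four levels of differentiation are needed to determine the null space.
In the case of only lateral translation, the null space of the observability codistribution is spanned by

$$\begin{bmatrix} \frac{X}{V_X} & \frac{Y}{V_X} & \frac{Z}{V_X} & 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^T$$

when translation is along $X$, while for translation along $Y$ we have

$$\mathrm{Null}([dh, dL_f h, \ldots, dL_f^4 h]) = \mathrm{span} \begin{bmatrix} \frac{X}{V_Y} & \frac{Y}{V_Y} & \frac{Z}{V_Y} & 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}^T.$$

In the case of pure rotation, a basis of the null space of the observability codistribution is obviously

$$\begin{bmatrix} \frac{X}{Z} & \frac{Y}{Z} & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^T$$

and all the points having the same projective coordinates are indistinguishable. In the case of
nonzero forward translation, the set of states which are indistinguishable from $[\mathbf{X}_0^T\ V_0^T\ \Omega_0^T]^T$ is

$$I\left( \begin{bmatrix} \mathbf{X}_0 \\ V_0 \\ \Omega_0 \end{bmatrix} \right) = \left\{ \begin{bmatrix} \mathbf{X}_0 \\ V_0 \\ \Omega_0 \end{bmatrix} + s \begin{bmatrix} \mathbf{X}_0 \\ V_0 \\ 0 \end{bmatrix} \ \middle|\ s \in \mathbb{R} \right\},$$

which is a linear variety of dimension one, and similarly for the other cases.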


4.3  Global  scale  ambiguity:  metric  constraints  on  the  state  manifold 

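Consider the solution $\mathbf{X}_i(t, \mathbf{X}_{i0}, V_0, \Omega_0)$ of (5), in the constant-velocity case, starting from the initial condition $\mathbf{X}_{i0}, V_0, \Omega_0$:

$$\mathbf{X}_i(t) = \begin{cases} e^{\Omega_0 \wedge t}\, \mathbf{X}_{i0} + \Gamma(\Omega_0, t) V_0 & \text{if } \|\Omega_0\| \neq 0 \\ \mathbf{X}_{i0} + V_0 t & \text{otherwise} \end{cases}$$

where

$$\Gamma(\Omega_0, t) = \frac{1}{\|\Omega_0\|^2} \left[ \left(I - e^{\Omega_0 \wedge t}\right) \Omega_0 \wedge {} + \Omega_0 \Omega_0^T t \right].$$

It is immediate to see that $\mathbf{X}_i(t, \alpha \mathbf{X}_{i0}, \alpha V_0, \Omega_0) = \alpha \mathbf{X}_i(t, \mathbf{X}_{i0}, V_0, \Omega_0)\ \forall\, \mathbf{X}_{i0}, V_0, \alpha$. Since
for central projection we have $\mathbf{x}_i(t) = \pi(\mathbf{X}_i) = \pi(\alpha \mathbf{X}_i)$, we conclude that any initial condition
$\alpha_1 \mathbf{X}_{i0}, \alpha_1 V_0, \Omega_0$ is indistinguishable from $\alpha_2 \mathbf{X}_{i0}, \alpha_2 V_0, \Omega_0$, for any possible $\alpha_1, \alpha_2 \in \mathbb{R}$.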

This one-dimensional unobservable space is very familiar, as we experience that an object moving
in front of us produces the same impression as a similar one which is “twice as big, twice as
far, and moving twice as fast”. However, we may impose norm constraints on the visible objects
or on the translational velocity in order to get rid of the scale factor ambiguity. For example, if we
impose $\|V_0\| = 1$, two initial conditions are indistinguishable only if $\alpha_1 = \pm\alpha_2$.
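The scale ambiguity is easy to reproduce numerically. The sketch below computes the image track of one point under constant velocities, then repeats the computation after scaling both the initial position and the translational velocity by the same factor. The helper names (`position`, `image_track`) and the numerical values are hypothetical; the closed-form solution used is the standard one for a constant twist.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def position(X0, V0, Omega0, t):
    """Closed-form solution of dX/dt = Omega0 ^ X + V0 at time t."""
    theta = np.linalg.norm(Omega0)
    if theta < 1e-12:
        return X0 + V0 * t
    W = hat(Omega0)
    R = (np.eye(3) + np.sin(theta * t) / theta * W
         + (1.0 - np.cos(theta * t)) / theta**2 * (W @ W))
    Gamma = ((np.eye(3) - R) @ W + np.outer(Omega0, Omega0) * t) / theta**2
    return R @ X0 + Gamma @ V0

def image_track(X0, V0, Omega0, times):
    """Perspective projections (x, y) of the point along its trajectory."""
    pts = np.array([position(X0, V0, Omega0, t) for t in times])
    return pts[:, :2] / pts[:, 2:3]

times = np.linspace(0.0, 2.0, 21)
X0 = np.array([0.4, -0.3, 5.0])
V0 = np.array([0.1, 0.05, 0.3])
Omega0 = np.array([0.02, -0.04, 0.06])
alpha = 2.5   # arbitrary positive scale factor

track = image_track(X0, V0, Omega0, times)
track_scaled = image_track(alpha * X0, alpha * V0, Omega0, times)      # same images
track_structure_only = image_track(alpha * X0, V0, Omega0, times)      # different images
```

Scaling structure and translation together leaves the image track unchanged, whereas scaling only the structure changes it, which is why a norm constraint on either quantity removes the ambiguity.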

There is still some information which is hidden by the model (5): we know that, if an object is
visible, it must be in front of the observer, i.e. $Z_i > 0\ \forall\, i$. Moreover, no points are allowed to lie
on the focal plane $Z = 0$ (plane at infinity), and therefore $\alpha_1 = \alpha_2$.

If we apply such metric constraints to the locally unobservable codistribution, we can reduce the
set of indistinguishable states to the trivial set. However, an appropriate model should include such
constraints explicitly in the state manifold. This may be done at the price of transforming the
state from the linear space $\mathbb{R}^9$ to the differentiable manifold with boundary $\mathbb{R}^2 \times \mathbb{R}_+ \times S^2 \times \mathbb{R}^3$ [4].
We now summarize some of the limitations of the model (5):
• The model is not locally observable. Metric constraints which make the model observable are
not explicitly encoded in the state representation.

• Three levels of Lie bracketing are needed to cover the observable part of the state space.
Yet we know that it is possible to estimate motion and structure from the first derivative
of the projection of the points (optical flow) [34, 27, 15].

• The model has the property of being “block diagonal” with respect to the structure, so that
the states corresponding to different points are independent. Yet it is strongly intuitive
that the more points are visible, the better the perception of motion ought to be.

5  Global  observability:  motion  estimation  as  identification  of  an 
Exterior  Differential  System 

In  this  section  we  describe  an  alternative  model  for  formulating  the  visual  motion  problem  which  has 
been presented in [47] and is related to a motion representation first introduced by Longuet-Higgins
in  [34]. 

Motion  estimation  is  viewed  as  a  problem  of  identifying  a  system  in  Exterior  Differential  Form  [7] 
with  parameters  on  a  topological  manifold,  called  the  “essential  manifold”  [47].  We  show  that  the 
model  is  globally  observable/identifiable  with  zero  level  of  differentiation  for  any  number  of  visible 
points.  When  more  points  are  available,  the  redundancy  may  be  exploited  in  order  to  reduce  the 
effect  of  the  measurement  noise. 

5.1  The  “essential  model” 

Consider a point in 3D space, with coordinates $\mathbf{X}_i(t)$ in the viewer’s reference. Let $\mathbf{X}_i(t + \tau)$ be
the coordinates after a rigid motion of the viewer $(T, R)$, of which $(V, \Omega)$ are the canonical screw
coordinates [39] as in equations (1,2)^2.

It is immediate to see that $\mathbf{X}_i(t)$, $\mathbf{X}_i(t + \tau)$ and $T$ are coplanar, and hence their triple product
is zero. Once expressed in a common reference, for example the viewer’s at time $t$, the coplanarity
constraint becomes [34]

$$\mathbf{X}_i^T(t + \tau)\, R\, (T\wedge)\, \mathbf{X}_i(t) = 0 \qquad \forall\, i = 1:n.$$

Note that the same relationship holds for $\mathbf{x}_i(t)$ and $\mathbf{x}_i(t + \tau)$, since they represent the projective
coordinates of $\mathbf{X}_i(t)$ and $\mathbf{X}_i(t + \tau)$; $T\wedge \in so(3)$ is a skew-symmetric matrix. After defining the
essential matrix as $Q = R(T\wedge)$, the essential constraint becomes

$$\mathbf{x}_i^T(t + \tau)\, Q\, \mathbf{x}_i(t) = 0 \qquad \forall\, i = 1:n. \tag{6}$$

Since there is an arbitrary scale factor in the above equality, we impose $\|Q\| = \|T\| = 1$. It has
been shown [34, 15, 50] that this constraint is not only a consequence of a rigid motion, but it also
characterizes it, in the sense that a sufficient number of constraints (6) allows us to characterize
rigid motion [34]. The essential matrix was first introduced by Longuet-Higgins [34], together
with a quasi-linear batch technique for estimating structure and motion from two views and more
than 8 visible points. His technique was then extended and developed in [50, 19, 15]. In [43]

^2 Note that $T$ in this section differs from the one used in the previous section. Rigid motion is represented here as
$\mathbf{X}(t + \tau) = R(\mathbf{X}(t) - T)$, for consistency with the notation of [34].
the algorithm is implemented using three pipelined Singular Value Decompositions. Finally [47]
proposes a recursive version of the motion estimation technique based on the essential constraint.
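The essential constraint and its use in a linear batch estimate can be illustrated with synthetic data. The sketch below generates points in front of the viewer, applies a rigid motion with the convention $\mathbf{X}(t + \tau) = R(\mathbf{X}(t) - T)$, checks that the constraint holds for $Q = R(T\wedge)$, and then recovers $Q$ up to scale and sign as the null vector of the stacked linear constraints. This is a generic noise-free least-squares sketch in the spirit of the quasi-linear technique mentioned above, not the algorithm of [34], [43] or [47]; the helper names and numerical values are hypothetical.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(Omega):
    """Rotation matrix exp(hat(Omega)) via Rodrigues' formula."""
    theta = np.linalg.norm(Omega)
    if theta < 1e-12:
        return np.eye(3)
    W = hat(Omega)
    return (np.eye(3) + np.sin(theta) / theta * W
            + (1.0 - np.cos(theta)) / theta**2 * (W @ W))

rng = np.random.default_rng(1)
n = 12                                            # more than 8 points
X = np.column_stack([rng.uniform(-2.0, 2.0, n),
                     rng.uniform(-2.0, 2.0, n),
                     rng.uniform(4.0, 8.0, n)])   # points in front of the viewer
T = np.array([0.3, -0.1, 0.2])
T = T / np.linalg.norm(T)                         # unit-norm translation
R = rodrigues(np.array([0.05, -0.1, 0.08]))

X_new = (R @ (X - T).T).T                         # X(t + tau) = R (X(t) - T)
x = X / X[:, 2:3]                                 # projective coordinates at t
x_new = X_new / X_new[:, 2:3]                     # projective coordinates at t + tau

Q = R @ hat(T)                                    # essential matrix
residuals = np.einsum('ni,ij,nj->n', x_new, Q, x) # x_new^T Q x for every point

# Each correspondence gives one linear equation in the 9 entries of Q:
# kron(x_new_i, x_i) . vec(Q) = 0, with vec taken row by row.
M = np.stack([np.kron(a, b) for a, b in zip(x_new, x)])
_, singular_values, Vt = np.linalg.svd(M)
Q_est = Vt[-1].reshape(3, 3)                      # null vector, unit Frobenius norm
Q_unit = Q / np.linalg.norm(Q)                    # true Q scaled to unit Frobenius norm
error = min(np.linalg.norm(Q_est - Q_unit), np.linalg.norm(Q_est + Q_unit))
```

The estimate is defined only up to scale and sign, which is why it is compared with the normalized true matrix under both signs; with noisy measurements the smallest singular vector gives the least-squares solution, and the result would still have to be projected onto the space of essential matrices.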
The  essential  matrices  are  points  of  the  space 

E  =  {RS\R  e  50(3)  ,  S  =  TAe  so(3)  |  ||r||  =  1} 

which  has  the  structure  of  an  algebraic  variety  [15].  We  now  show  that  a  shght  modification  of  E 
is  a  topological  manifold  of  class  at  least  0°. 

Theorem 5.1 Let $d_{x,x'}$ be the triangulation function³, which gives the depth of a point from its motion $Q$ and its projective coordinates $x, x'$. Then $E = E \cap d_{x,x'}^{-1}(\mathbb{R}_+^{2n})$ is a topological manifold of class at least $C^0$.

Proof: 

$E$ inherits the topology from $\mathbb{R}^9$. Consider the map

$$\Phi : E \to \mathbb{R}^2 \times \mathbb{R}^3 \sim \mathbb{R}^5, \qquad Q \mapsto \begin{bmatrix} T \\ \Omega \end{bmatrix}, \quad T = \pm V_{\cdot 3}, \quad e^{(\Omega\wedge)} = U R_Z(\pm\tfrac{\pi}{2}) V^T \qquad (7)$$

where $U, V$ are defined by the Singular Value Decomposition (SVD) [18] of $Q = U \Sigma V^T$, $V_{\cdot 3}$ denotes the third column of $V$ and $R_Z(\frac{\pi}{2})$ is a rotation of $\frac{\pi}{2}$ about the $Z$ axis. As usual $\Omega$ is the rotation 3-vector corresponding to the $3 \times 3$ rotation matrix $U R_Z(\pm\frac{\pi}{2}) V^T$ and is obtained using Rodrigues' formulae [39], which give in fact a local representation of $SO(3)$⁴. $T$ is represented in spherical coordinates. Note that the map $\Phi$ defines the local coordinates of the essential manifold modulo a sign in the direction of translation and in the rotation angle of $R_Z$; therefore the map $\Phi$ associates to each element of the essential manifold 4 distinct points in local coordinates. This ambiguity can be resolved by imposing the "positive depth constraint", i.e. that each visible point lies in front of the observer [34, 35]. Consider one of the four local counterparts of $Q \in E$, and the function $d_{x,x'} : E \to \mathbb{R}^{2n}$ defined by

$$d_{x,x'}(Q) = [Z,\ Z']^T \qquad (8)$$

with $Z_i = \frac{\langle m^i, n^i \rangle}{\|m^i\|^2}\ \forall\, i = 1, \ldots, n$, $m^i = (R x^i) \wedge x'^i$ and $n^i = (RT) \wedge x'^i$, which gives the depth of each point as a function of the projection and the motion parameters⁵. Note that it is locally smooth away from zero translation. Now redefine the essential space as

$$E = E \cap d_{x,x'}^{-1}(\mathbb{R}_+^{2n}) = \{Q = RS \mid R \in SO(3),\ S = T\wedge \in so(3),\ \|T\| = 1,\ d^i_{x,x'}(Q) > 0\ \forall\, i = 1 \ldots n\}$$

where $\mathbb{R}_+$ is the positive open half space of $\mathbb{R}$ and $d_{x,x'}^{-1}$ denotes the preimage of $d_{x,x'}$. Consider $\Phi$ restricted to $E$. It follows from the properties of the SVD that $\Phi$ is continuous, and furthermore it is bijective. It can be shown (see appendix B) that $Q \in E \Leftrightarrow \Sigma = \mathrm{diag}\{1\ 1\ 0\}$ and hence the

³See equation (8) in the proof for an instance of realization of the triangulation function.

⁴We use a hybrid representation, with $T$ being the discrete translation between two successive samples, while $\Omega$ is the instantaneous rotational velocity which corresponds to $R$. Note that $(V, \Omega)$ and $(T, R)$ are related via the exponential map of equations (1,2).

⁵Note that when the triangulation is computed for a large number of points, a "Total Least Squares" solution for $Z$ is most appropriate [18].
subspaces $\langle V_{\cdot 1}, V_{\cdot 2} \rangle$ and $\langle U_{\cdot 1}, U_{\cdot 2} \rangle$ are allowed to switch. This happens, however, without affecting continuity of $T$ and $\Omega$. The inverse map is simply

$$\Phi^{-1} : \begin{bmatrix} T \\ \Omega \end{bmatrix} \mapsto e^{(\Omega\wedge)}(T\wedge) \in E$$

which is smooth. Hence $E$ is a topological manifold of class at least $C^0$.


Q.E.D. 
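The role of the triangulation function in selecting the sign of the translation can be illustrated numerically. The sketch below assumes the convention $X' = R(X - T)$, from which $Z' x' = Z R x - R T$ and hence $Z\,[(Rx) \wedge x'] = (RT) \wedge x'$; the depth $Z$ is taken as the least-squares solution of this relation, and $Z'$ is read off the third component. The function names and numerical values are hypothetical.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(p):
    """Projective coordinates (X/Z, Y/Z, 1) of a 3D point."""
    return [p[0] / p[2], p[1] / p[2], 1.0]

def triangulate_depths(R, T, x, x_next):
    """Depths (Z, Z') of one point from the motion (R, T) and its projections."""
    Rx, RT = matvec(R, x), matvec(R, T)
    m = cross(Rx, x_next)           # m = (R x) ^ x'
    n = cross(RT, x_next)           # n = (R T) ^ x'
    Z = dot(m, n) / dot(m, m)       # least-squares solution of Z m = n
    Z_next = Z * Rx[2] - RT[2]      # third component of Z R x - R T
    return Z, Z_next

R = rot_y(0.2)                      # illustrative rotation
T = [0.6, 0.0, 0.8]                 # illustrative unit-norm translation
X = [1.0, 2.0, 5.0]                 # true depth is 5
X_next = matvec(R, [X[i] - T[i] for i in range(3)])
x, x_next = project(X), project(X_next)

Z, Z_next = triangulate_depths(R, T, x, x_next)
Z_flip, Z_next_flip = triangulate_depths(R, [-t for t in T], x, x_next)
print(Z, Z_next)            # both positive for the true motion
print(Z_flip, Z_next_flip)  # both negative when the sign of T is flipped
```

Flipping the sign of $T$ flips the sign of both depths, so the positive depth constraint discards that branch; the ambiguity in the rotation angle of $R_Z$ is not exercised in this sketch.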


Remark 5.1 Note that the essential manifold is defined independently of the existence of observed points, as a compact representation of $SE(3)$, the former being embedded in a linear space of smaller dimension. However, in order for $E$ to qualify as a manifold, we need to take into account at least one visible point. This is done by imposing the "positive depth constraint", i.e. imposing that the visible points have positive depth in both the coordinate frames at time $t$ and $t + \tau$.


If we let $\tau \to 0$ in the above construction, under the small acceleration assumption we may write the essential constraint as

$$\begin{cases} \dot{x}_i^T Q x_i = 0, \quad Q \in E & \forall\, i = 1 \ldots n \\ y_i = x_i + \nu_i, \end{cases}$$

which represents a system in exterior differential form [7]: $f(x)\,dx = 0$, with $f(x) = (Qx)^T$. Hence motion estimation may be viewed as identification of a linear exterior differential system, with parameters on the essential manifold. Note that observability/identifiability of the above model is independent of the euclidean structure, for it depends solely on the projective coordinates of the points. Note also that the role of structure (depth) is to allow us to choose one of the four branches of the local coordinates chart. This needs to be done only at one time step, and then propagated across time.


5.2  Observability  for  N  >8  points 

Since the essential constraint is linear in $Q$, it is possible to write it using the (improper) notation

$$\chi(x, x')\,Q = 0$$

where $\chi$ is an $n \times 9$ matrix and $Q$ is interpreted as a nine-vector obtained by stacking the columns of $Q$ on top of each other. The generic row of $\chi$ is $[x_1 x'_1 \;\; x_2 x'_1 \;\; x'_1 \;\; x_1 x'_2 \;\; x_2 x'_2 \;\; x'_2 \;\; x_1 \;\; x_2 \;\; 1]$. We write $\chi = [\chi_1 \ldots \chi_n]^T$.
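The linear form of the constraint can be made concrete with a small sketch. One caveat on ordering: with the row layout quoted above, the product with the nine-vector reproduces $x'^T Q x$ when the entries of $Q$ are flattened row by row, which is the assumption made in the code; stacking columns instead exchanges the roles of $x$ and $x'$ in each product. The function names and the integer test values are illustrative.

```python
def chi_row(x, x_next):
    """Row [x1 x1', x2 x1', x1', x1 x2', x2 x2', x2', x1, x2, 1] for one point.

    x and x_next are projective coordinates (x1, x2, 1) and (x1', x2', 1).
    """
    return [x_next[i] * x[j] for i in range(3) for j in range(3)]

def flatten_rows(Q):
    """Nine-vector of Q, assuming row-by-row ordering (see the lead-in)."""
    return [Q[i][j] for i in range(3) for j in range(3)]

def bilinear(x_next, Q, x):
    """Direct evaluation of x'^T Q x."""
    return sum(x_next[i] * Q[i][j] * x[j] for i in range(3) for j in range(3))

Q = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # arbitrary 3 x 3 matrix
x = [2, 3, 1]
x_next = [-1, 4, 1]

linear_form = sum(a * b for a, b in zip(chi_row(x, x_next), flatten_rows(Q)))
print(linear_form, bilinear(x_next, Q, x))  # → 152 152
```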

Following the track of the previous sections, we will assume small acceleration or a statistical model for motion which, lifted to the essential manifold, results in a statistical model for $Q$. The resulting model has the form

$$\begin{cases} \dot{Q} = f(Q, \nu_Q), \quad Q \in E \\ \chi(x, x')\,Q = \tilde{\nu} \end{cases}$$

where $f$ is either zero or some statistical model; $\nu_Q$ and $\tilde{\nu}$ are noise processes which can be characterized, as discussed in [48]. In [47], two recursive schemes are proposed for solving the estimation problem: one is based upon an Implicit Extended Kalman Filter (IEKF) in the local coordinates of the essential manifold, the other is based upon a linear update on the linear embedding space $\mathbb{R}^9$, followed by a projection onto the essential manifold.
Now consider $\chi$: if it has rank 8, then there exists a unique $Q$ which spans its null space modulo a sign, since we have imposed a constraint on its norm. This generates four distinct points in the local coordinates, which reduce to a single solution once the positive depth constraint is imposed. Once this is done at one step, we choose a branch of the local coordinates map (which then becomes a local homeomorphism) and stick with it for the subsequent time steps [34, 50, 15].
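The rank statement can be explored on synthetic data. The sketch below builds $\chi$ for twelve pseudo-random points under an illustrative motion, checks that the true essential matrix lies in its null space, and estimates the rank by Gaussian elimination. It assumes the convention $X' = R(X - T)$ and a row-by-row flattening of $Q$ matching the row layout of $\chi$; the seed, tolerance and names are arbitrary choices.

```python
import math
import random

def hat(t):
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(p):
    return [p[0] / p[2], p[1] / p[2], 1.0]

def chi_row(x, x_next):
    return [x_next[i] * x[j] for i in range(3) for j in range(3)]

def rank(rows, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) < tol:
            continue                      # no usable pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            f = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= f * a[r][j]
        r += 1
    return r

random.seed(0)
R = matmul(rot_z(0.1), rot_y(0.2))        # illustrative rotation
T = [1.0, 0.0, 0.0]                       # illustrative unit-norm translation
chi = []
for _ in range(12):
    X = [random.uniform(-2, 2), random.uniform(-2, 2), random.uniform(3, 6)]
    X_next = matvec(R, [X[i] - T[i] for i in range(3)])
    chi.append(chi_row(project(X), project(X_next)))

Q = matmul(R, hat(T))
q = [Q[i][j] for i in range(3) for j in range(3)]   # row-by-row flattening
residual = max(abs(sum(a * b for a, b in zip(row, q))) for row in chi)
chi_rank = rank(chi)
print(chi_rank, residual)
```

Since $Q$ always lies in the null space, the rank cannot exceed 8; for points drawn at random it is expected to equal 8, in which case the null space is spanned by $Q$ alone.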

We  are  led  naturally  to  the  following  definition: 

Definition 5.1 We say the points $x$ are in general position $\Leftrightarrow$ $\mathrm{rank}(\chi(x, x')) = 8$.

Note that the general position condition also depends on motion. The following are tautologies which come as a direct consequence of the definitions:

Claim 5.1 The essential model is observable/identifiable modulo a sign under general position conditions.

Claim 5.2 If an essential model is in general position then it is possible to reconstruct the motion $(V, \Omega)$ of the viewer modulo four solutions. The solution is unique once the positive depth constraint is imposed at one time instant.

We still have to address the issue of the conditions under which the matrix $\chi$ has full rank. Furthermore we need to deal with the case of fewer than 8 visible points, since it automatically excludes general position conditions.

5.3  Observability  with  less  than  8  points 

When fewer than 8 points are visible, it is not possible to achieve the above sufficient conditions for motion observability. Suppose that, at time $t + \tau_i$, the matrix $\chi(t + \tau_i)$ has a null space of dimension $k_i$. If the viewer moves with constant velocity, then we may write

$$\begin{aligned} \chi(t)\,Q(t) &= 0 \\ \chi(t+\tau_1)\,Q(t+\tau_1) &= \chi(t+\tau_1)\,Q(t) = 0 \\ &\ \,\vdots \\ \chi(t+\tau_p)\,Q(t+\tau_p) &= \chi(t+\tau_p)\,Q(t) = 0 \end{aligned}$$

until $k_0 + k_1 + \ldots + k_p = 8$. If this happens, we may go back to the previous case and restate the sufficient conditions for motion observability for the extended matrix

$$\chi_p = \begin{bmatrix} \chi(t) \\ \chi(t+\tau_1) \\ \vdots \\ \chi(t+\tau_p) \end{bmatrix}$$

Of course this may not happen. As a consequence of the above observations, we redefine general position as follows:

Definition 5.2 We say an essential model is in general position (GP) when either there are more than 8 visible points and $\chi$ has rank 8, or there exists a time instant $\tau_p$ such that the extended matrix $\chi_p$ has rank 8.
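The stacking argument can be tried on a toy example with only four tracked points. The sketch below assumes a constant motion between consecutive samples, $X(t+\tau_{k+1}) = R(X(t+\tau_k) - T)$, so that the same $Q$ applies to every pair of frames, and compares the rank of a single block with the rank of the extended matrix over three pairs. The points, motion, tolerance and function names are illustrative; the extended rank is expected, but not guaranteed, to reach 8 for generic data.

```python
import math
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(p):
    return [p[0] / p[2], p[1] / p[2], 1.0]

def chi_row(x, x_next):
    return [x_next[i] * x[j] for i in range(3) for j in range(3)]

def rank(rows, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) < tol:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            f = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= f * a[r][j]
        r += 1
    return r

random.seed(1)
R = matmul(rot_z(0.1), rot_y(0.2))    # constant rotation between samples
T = [1.0, 0.0, 0.0]                   # constant translation between samples
pts = [[random.uniform(-2, 2), random.uniform(-2, 2), random.uniform(3, 6)]
       for _ in range(4)]             # only four visible points

blocks = []
for _ in range(3):                    # three consecutive pairs of frames
    nxt = [matvec(R, [X[i] - T[i] for i in range(3)]) for X in pts]
    blocks.append([chi_row(project(X), project(Y)) for X, Y in zip(pts, nxt)])
    pts = nxt

single_rank = rank(blocks[0])
stacked_rank = rank([row for block in blocks for row in block])
print(single_rank, stacked_rank)
```

A single block has at most four rows and therefore rank at most 4, while the extended matrix can accumulate rank over time up to the bound of 8 imposed by the common null vector $Q$.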

Then  we  may  restate  the  sufficient  conditions  for  observability  as  a  new  tautology 


Theorem  5.2  If  an  essential  model  is  in  general  position,  then  it  is  globally  observable. 
5.4 General position; rank condition for global observability of rigid motion

We are now interested in writing explicitly the general position condition and promoting the previous tautologies to true claims. This is done using results in [35, 15, 50] for the case of more than 8 points. The claim, extended to our general position condition, may be stated as:

Theorem 5.3 An essential model is in general position $\Leftrightarrow$ there does not exist a (proper) quadric surface⁶ in $\mathbb{R}^3$ which contains all the visible points and the path of the center of projection.

Remark 5.2 We report here a proof given by Mennucci [38] for the case of more than 8 visible points. See also [35]. The general case is obtained by substituting $x_{t+\tau_i}$ for $x_i$. Note that the quadric surface is a thin set in the 3D euclidean space. The measurement noise in the projected coordinates is sufficient to set the model in general position. Note also that $T \neq 0$ plays a critical role in achieving global observability, while $\Omega$ (or $R$) has no influence.

Proof: 

Let $T \neq 0$. Consider the points to be fixed in an intermediate reference system, where their coordinates are $(X''_i)$ such that $X'_i = R(X''_i - T)$, $X_i = R^T(X''_i + T)$; then $X'^T_i Q X_i = 0$, $1 \le i \le n$, and the same holds for $x$ in place of $X$.

By substitution we get

$$x'^T_i Q x_i = [R(x''_i - T)]^T Q R^T (x''_i + T) = (x''_i - T)^T R^T Q R^T (x''_i + T) = 0 \quad 1 \le i \le n \qquad (9)$$

We may change the variable in this equation to be $Q' = R^T Q R^T$; since $R$ is invertible, this would not change any of the considerations below. Note that this implies that there is no dependency on rotation; we will therefore assume $R = I$ without loss of generality.

Equation  (9)  becomes 

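$$(x''_i - T)^T Q (x''_i + T) = x''^T_i Q x''_i - T^T Q x''_i + x''^T_i Q T - T^T Q T = 0 \quad 1 \le i \le n \qquad (10)$$

Call $\langle Q \rangle = \{Q \in \mathbb{R}^{3 \times 3} \mid (x''_i - T)^T Q (x''_i + T) = 0,\ 1 \le i \le n\}$; $\langle Q \rangle$ is a vector subspace of $\mathbb{R}^{3 \times 3}$ and the fact that there is only one solution is equivalent to saying that the dimension of $\langle Q \rangle$ is one; indeed, $\dim(\langle Q \rangle)$ is always greater than or equal to one, since it contains the matrix $T\wedge$, as can be seen by direct substitution in eq. (10).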

Suppose that equation (9) holds for a matrix $M$, and decompose it into its symmetric and antisymmetric parts $S = \frac{M + M^T}{2}$, $A = \frac{M - M^T}{2}$. Then

$$x''^T_i S x''_i - 2 T^T A x''_i - T^T S T = 0 \quad 1 \le i \le n.$$

Consider the set $\langle V \rangle = \{x \in \mathbb{R}^3 \mid x^T S x - 2 T^T A x - T^T S T = 0\}$. This set always contains the two points $T$ and $-T$, the centers of projection (as a simple computation shows).

Suppose there is no (proper) quadric surface containing the points $x''_i$; then it must be that $V = \mathbb{R}^3$, which means that $S = 0$ and $T^T A = 0$; this means that $M$ is necessarily a multiple of $T\wedge = Q$, so we get that $\dim(\langle Q \rangle) = 1$.

Vice versa, suppose that the symmetric part $S$ of $M$ is nonzero or that $T^T A \neq 0$: then the set $\langle V \rangle$ is a quadric surface that contains the points $x''_i$ (by definition), and it contains the points $T$ and $-T$; the latter are the two centers of projection (if the symmetric part $S = 0$, then the set $\{x \in \mathbb{R}^3 \mid T^T A x = 0\}$ is a plane, which is anyway a quadric surface). Q.E.D.
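Two algebraic steps of the proof are easy to check numerically: the reduction of $(x - T)^T M (x + T)$ to the form involving the symmetric and antisymmetric parts of $M$, and the fact that $T$ and $-T$ always satisfy the resulting equation. The following is a minimal sketch with an arbitrary matrix $M$ and arbitrary vectors chosen for illustration; the function names are hypothetical.

```python
def quad(a, M, b):
    """Bilinear form a^T M b."""
    return sum(a[i] * M[i][j] * b[j] for i in range(3) for j in range(3))

M = [[1.0, 2.0, -1.0],
     [0.5, -3.0, 4.0],
     [2.0, 1.0, 0.0]]               # arbitrary 3 x 3 matrix
T = [0.6, 0.0, 0.8]                 # arbitrary nonzero translation

# Symmetric and antisymmetric parts of M.
S = [[(M[i][j] + M[j][i]) / 2 for j in range(3)] for i in range(3)]
A = [[(M[i][j] - M[j][i]) / 2 for j in range(3)] for i in range(3)]

def lhs(x):
    """(x - T)^T M (x + T), the left-hand side of equation (10)."""
    minus = [x[i] - T[i] for i in range(3)]
    plus = [x[i] + T[i] for i in range(3)]
    return quad(minus, M, plus)

def rhs(x):
    """x^T S x - 2 T^T A x - T^T S T, the quadric form used in the proof."""
    return quad(x, S, x) - 2 * quad(T, A, x) - quad(T, S, T)

x = [0.3, -1.2, 2.0]
minus_T = [-t for t in T]
print(lhs(x) - rhs(x))              # zero up to rounding error
print(rhs(T), rhs(minus_T))         # both zero up to rounding error
```

The second line of output reflects that $T^T A T = 0$ for any antisymmetric $A$, which is why both centers of projection lie on the set $\langle V \rangle$.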

⁶A quadric surface is a set $\{x \in \mathbb{R}^3 \mid x^T A x + b^T x + c = 0\}$ where $A$ is a $3 \times 3$ matrix, $b$ is a 3-vector and $c$ is a scalar. It is proper if it is a proper subset of $\mathbb{R}^3$.
6 Conclusions


We have analyzed the observability of rigid motion under projection. The model which defines the problem for feature points in the euclidean 3D space lacks local observability. The observable manifold is covered with three levels of Lie differentiation. The problem is indeed observable once formulated in the appropriate topological space.

We have studied a formulation of visual motion estimation in terms of identification of an Exterior Differential System with parameters on the essential manifold [47]. The model is globally observable/identifiable with zero level of bracketing; when more points are available, redundancy may be exploited to reduce the effect of measurement noise.
A Notation

In this section we introduce some notation, referring to [40, 45, 33, 32, 25], for the system:

$$\begin{cases} \dot{x} = f(x, u), \quad x(t_0) = x_0 \\ y = h(x) \end{cases} \qquad (*)$$

where $x \in N \subset \mathbb{R}^n$, some $n$-dimensional manifold, $u \in M \subset \mathbb{R}^m$ and $y \in P \subset \mathbb{R}^p$; it is assumed that $f$ and $h$ are smooth functions. The set of admissible inputs is described as $\mathcal{U} = \{u : \mathbb{R}^+ \to M \subset \mathbb{R}^m\}$ such that

1. $\mathcal{U}$ is closed under concatenation

2. $f$ describes a family of vector fields parametrized by $u \in M$.

3. $u$ are piecewise constant functions which are piecewise continuous from the right:

$\{u(t) = u_i,\ t \in [t_1 + \ldots + t_{i-1},\ t_1 + \ldots + t_i) \mid u_i \in M \subset \mathbb{R}^m,\ \forall\, i\}$.

We call $f_i = f(x, u_i)$; in the time interval $I_i$ the system evolves along the integral curve of $f_i$. The above assumptions may be partially relaxed; however, they are general enough for our purposes. In studying the visual motion problem, we will be mostly concerned with the autonomous case: $u(t) = 0\ \forall\, t$.

Definition A.1 $x_1$ and $x_2$ are said to be indistinguishable (denoted $x_1 I x_2$) $\Leftrightarrow$ $\forall\, u \in \mathcal{U}$,

$$h(\phi_t(x_1, u)) = h(\phi_t(x_2, u)) \quad \forall\, t \ge 0.$$

$I(x) = \{x_1 \mid x_1 I x;\ x \in N\}$ is the set of states which are indistinguishable from $x$.

Definition A.2 (*) is completely observable (C-O) at $x$ $\Leftrightarrow$ $I(x) = \{x\}$.

(*) is completely observable $\Leftrightarrow$ it is C-O at $x$ $\forall\, x \in N$.

Definition A.3 Given an open set $U \subset N$, $x_1$ and $x_2$ are said to be $U$-indistinguishable (denoted $x_1 I^U x_2$) $\Leftrightarrow$ $\{\phi_t(x_1, u) \in U,\ \phi_t(x_2, u) \in U\ \forall\, t \in [t_0, t_1]\} \Rightarrow h(\phi_t(x_1, u)) = h(\phi_t(x_2, u))\ \forall\, t \in [t_0, t_1]$.

$I^U(x) = \{x_1 \mid x_1 I^U x\}$ is the set of states which are $U$-indistinguishable from $x$.

Definition A.4 (*) is said to be locally weakly observable (L-W-O) at $x$ $\Leftrightarrow$ $\exists\, U,\ x \in U \mid \forall\, V \subset U,\ x \in V,\ I^V(x) = \{x\}$.

(*) is said to be locally weakly observable $\Leftrightarrow$ it is L-W-O at $x$ $\forall\, x \in N$.

Definition A.5 The observability space $O$ for (*) is defined to be the smallest subspace of $C^\infty(N)$ which contains the functions $\{h_1 \ldots h_p\}$ and is invariant under Lie differentiation along vector fields in $\mathcal{F} = \{f_i = f(x, u_i)\}$.

Definition A.6 The observability codistribution is defined as

$$dO = \{d\lambda \mid \lambda \in O\}$$

The observability codistribution is the smallest codistribution which is invariant for (*) and contains the forms $dh$. It can be shown that the definition does not change if we allow the vector fields in $\mathcal{F}$ to belong to the accessibility algebra, which consists of repeated Lie brackets of vector fields in $\mathcal{F}$.
Definition A.7 A system is said to satisfy the observability rank condition (ORC) at $p$ $\Leftrightarrow$ $\dim(dO)_p = n$.

Remark A.1 The ORC can be stated in terms of exterior differential systems. In fact we may interpret the observability codistribution as a Pfaffian system [7]

$$dO = dh + dL_f h + \ldots + dL_f^{n-1} h$$

where $f = f(\cdot, u)$ and $n$ is the dimension of the state-space manifold $N$. The observability rank condition may be stated as:

Definition A.8 The system (*) satisfies the observability rank condition at $p$ $\Leftrightarrow$ $dO_p = T_p^* N$.

Theorem A.1 If $\dim(dO) = n$ at $p$, then (*) is locally weakly observable in a neighborhood of $p$.

Proof: see [41, 28, 9].

This condition is not necessary [9]; however, the following result holds:

Theorem A.2 If $dO$ has constant dimension and the system (*) is locally weakly observable, then $\mathrm{rank}(dO) = n$.
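As a worked illustration of the rank condition, consider a hypothetical two-state autonomous system, not taken from the paper, in which a position is measured and the velocity is constant. The sketch below computes the first Lie derivative of the output and the resulting codistribution.

```latex
% Hypothetical example: state x = (x_1, x_2), n = 2, no input.
\dot{x}_1 = x_2, \qquad \dot{x}_2 = 0, \qquad y = h(x) = x_1
% Lie derivative of h along f = (x_2, 0):
L_f h = \frac{\partial h}{\partial x_1}\, x_2 + \frac{\partial h}{\partial x_2} \cdot 0 = x_2
% Observability codistribution:
dO = \mathrm{span}\{dh,\ dL_f h\} = \mathrm{span}\{dx_1,\ dx_2\}, \qquad \dim(dO)_p = 2 = n \quad \forall\, p
```

The rank condition holds at every point with one level of Lie differentiation, so by Theorem A.1 this example is locally weakly observable.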

B  Characterization  of  the  essential  space 

The  essential  space  has  been  defined  in  section  5;  in  this  appendix  we  show  a  simple  characterization 
which  is  due  to  Faugeras  and  Maybank  [15,  37]. 

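Theorem B.1 Let $Q = U \Sigma V^T$ be the SVD of an element of $\mathbb{R}^{3 \times 3}$. Then

$$Q \in E \Leftrightarrow \Sigma = \Sigma_0 \doteq \mathrm{diag}\{\lambda\ \lambda\ 0\} \mid \lambda \in \mathbb{R}_+.$$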


Proof: 

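($\Rightarrow$) Let $Q = RS \mid R \in SO(3),\ S \in so(3)$; $\sigma(Q)$, the set of singular values of $Q$, is such that $\sigma(Q) = \sqrt{\sigma(QQ^T)}$. Next observe that $QQ^T = R S S^T R^T$, which has the same eigenvalues as $S S^T = -S^2$. Also $\forall\, S \in so(3)\ \exists!\, T \mid S = (T\wedge)$, and the singular values of $S^2$ are $\{0, \|T\|^2, \|T\|^2\}$. Hence if $Q \in E$, it has two equal singular values and a zero singular value.

($\Leftarrow$) Let $Q = U \Sigma_0 V^T$ for some orthonormal $U, V$ and for some $\lambda$. Let furthermore $R_Z(\frac{\pi}{2})$ be a rotation of $\frac{\pi}{2}$ about the $Z$ axis; then

$$Q = U \Sigma_0 V^T = U R_Z(\tfrac{\pi}{2})^T V^T\, V R_Z(\tfrac{\pi}{2}) \Sigma_0 V^T.$$

Now call $R = U R_Z(\frac{\pi}{2})^T V^T$ and $S = V R_Z(\frac{\pi}{2}) \Sigma_0 V^T$; it is immediate to see that $R R^T = R^T R = I_3$ and $S^T = -S$, hence the claim. Q.E.D.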
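The forward direction of the theorem can be checked without computing an SVD. For $Q = R(T\wedge)$ one has $Q^T Q = (T\wedge)^T (T\wedge) = \|T\|^2 I - T T^T$, so $T$ is mapped to zero and every vector orthogonal to $T$ is scaled by $\|T\|^2$; the singular values are therefore $\{\|T\|, \|T\|, 0\}$. The sketch below verifies this on illustrative values; the rotation, the translation and the function names are assumptions made for the example.

```python
import math

def hat(t):
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

T = [1.0, 2.0, 2.0]                       # ||T|| = 3, so ||T||^2 = 9
R = matmul(rot_z(0.1), rot_y(0.2))        # illustrative rotation
Q = matmul(R, hat(T))
QtQ = matmul(transpose(Q), Q)

v = [0.0, 2.0, -2.0]                      # orthogonal to T
w = [-8.0, 2.0, 2.0]                      # orthogonal to both T and v

null_image = matvec(QtQ, T)               # expected: zero vector
v_image = matvec(QtQ, v)                  # expected: 9 * v
w_image = matvec(QtQ, w)                  # expected: 9 * w
print(null_image, v_image, w_image)
```

The eigenvalues $\{9, 9, 0\}$ of $Q^T Q$ correspond to singular values $\{3, 3, 0\}$ of $Q$, matching $\Sigma_0 = \mathrm{diag}\{\lambda\ \lambda\ 0\}$ with $\lambda = \|T\|$.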